Using Confidence Interval to Summarize the Evaluation Results: A Case Study
Abstract
Distributed Shared Memory (DSM) has gained wide acceptance by combining the scalability and low cost of distributed systems with the ease of use of a single address space. Many new hardware and software DSM systems have been proposed in recent years, and benchmarking is widely used to demonstrate their performance advantages. However, the method commonly used to summarize the measured results is the arithmetic mean of ratios, which is incorrect in some cases. Furthermore, many published papers present a large amount of data without summarizing it effectively, which greatly confuses readers. In fact, many users want a single number as the conclusion, which the old evaluation methods did not provide. Therefore, this paper proposes a new data-summarizing technique based on confidence intervals. The technique includes two methods: (1) the paired confidence interval method and (2) the unpaired confidence interval method. With this technique, we can state, at a given confidence level, that one system is better than the others. In addition, with the help of the confidence level, we propose standardizing the benchmarks used to evaluate DSM systems so that the results are convincing. Moreover, the new summarizing technique applies not only to evaluating DSM systems but also to evaluating other systems, such as memory systems and communication systems.
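The abstract names the two summarizing methods but does not give their formulas. The sketch below assumes the textbook paired t-interval for the paired method and Welch's interval for the unpaired method; this is our assumption, not necessarily the paper's exact formulation. The function names (`paired_ci`, `unpaired_ci`, `t_quantile`) and the benchmark timings are hypothetical, and the Student t quantile is computed by numerical integration so that only the Python standard library is needed.

```python
"""Sketch: paired and unpaired confidence intervals for comparing two systems.

Assumptions (not from the paper): textbook paired t-interval, Welch's unpaired
interval, hypothetical helper names and data. Standard library only.
"""
import math
from statistics import mean, stdev


def _t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """CDF of Student's t at x >= 0, via Simpson's rule on the density over [0, x]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric about 0, so F(0) = 0.5.
    return 0.5 + total * h / 3


def t_quantile(p: float, df: float) -> float:
    """Return t with P(T <= t) = p for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while _t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if _t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_ci(a, b, confidence=0.95):
    """Paired method: a[i] and b[i] are the same benchmark run on systems A and B.

    Returns a confidence interval for the mean per-benchmark difference A - B.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 entries")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    half = t_quantile((1 + confidence) / 2, n - 1) * stdev(diffs) / math.sqrt(n)
    return mean(diffs) - half, mean(diffs) + half


def unpaired_ci(a, b, confidence=0.95):
    """Unpaired method: a and b are independent samples (sizes may differ).

    Returns a confidence interval for mean(a) - mean(b) using Welch's standard
    error and the Welch-Satterthwaite effective degrees of freedom.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least 2 entries")
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    d = mean(a) - mean(b)
    if va + vb == 0:
        return d, d  # no variability at all: the interval collapses to a point
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    half = t_quantile((1 + confidence) / 2, df) * math.sqrt(va + vb)
    return d - half, d + half


if __name__ == "__main__":
    # Hypothetical run times (s) of the same five benchmarks on two DSM systems.
    sys_a = [10.2, 8.1, 15.3, 7.7, 12.0]
    sys_b = [11.0, 8.9, 15.9, 8.5, 12.4]
    print("paired   95% CI for A - B:", paired_ci(sys_a, sys_b))
    print("unpaired 95% CI for A - B:", unpaired_ci(sys_a, sys_b))
    # Arithmetic mean of ratios: each system appears 25% slower than the other.
    x, y = [1.0, 4.0], [2.0, 2.0]
    print(mean(p / q for p, q in zip(x, y)), mean(q / p for p, q in zip(x, y)))  # 1.25 1.25
```

If an interval excludes zero, one system is better than the other at the chosen confidence level; if it contains zero, the data do not support that conclusion. With the hypothetical timings, the paired interval is roughly (-0.90, -0.46), so A is faster, while the unpaired interval, roughly (-5.1, 3.8), contains zero, because pairing removes the large benchmark-to-benchmark variation. The last line shows why the arithmetic mean of ratios can mislead: on the same data, each system looks 25% slower than the other.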
Publication date: 1999